NSF PAR Search | NSF Public Access Repository

Querying Templatized Document Collections with Large Language Models

https://doi.org/10.1109/ICDE65448.2025.00183

Lin, Yiming; Hulsebos, Madelon; Ma, Ruiying; Shankar, Shreya; Zeighami, Sepanta; Parameswaran, Aditya G; Wu, Eugene (May 2025, IEEE)

Free, publicly-accessible full text available May 19, 2026
Building Reactive Large Language Model Pipelines with Motion

https://doi.org/10.1145/3626246.3654734

Shankar, Shreya; Parameswaran, Aditya G (June 2024, ACM)

Full Text Available
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

https://doi.org/10.1145/3654777.3676450

Shankar, Shreya; Zamfirescu-Pereira, JD; Hartmann, Bjoern; Parameswaran, Aditya; Arawjo, Ian (October 2024, ACM)

Full Text Available
"We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning

https://doi.org/10.1145/3653697

Shankar, Shreya; Garcia, Rolando; Hellerstein, Joseph M; Parameswaran, Aditya G (April 2024, Proceedings of the ACM on Human-Computer Interaction)

Organizations rely on machine learning engineers (MLEs) to deploy models and maintain ML pipelines in production. Due to models' extensive reliance on fresh data, the operationalization of machine learning, or MLOps, requires MLEs to have proficiency in data science and engineering. When considered holistically, the job seems staggering: how do MLEs do MLOps, and what are their unaddressed challenges? To address these questions, we conducted semi-structured ethnographic interviews with 18 MLEs working on various applications, including chatbots, autonomous vehicles, and finance. We find that MLEs engage in a workflow of (i) data preparation, (ii) experimentation, (iii) evaluation throughout a multi-staged deployment, and (iv) continual monitoring and response. Throughout this workflow, MLEs collaborate extensively with data scientists, product stakeholders, and one another, supplementing routine verbal exchanges with communication tools ranging from Slack to organization-wide ticketing and reporting systems. We introduce the 3Vs of MLOps (velocity, visibility, and versioning), three virtues of successful ML deployments that MLEs learn to balance and grow as they mature. Finally, we discuss design implications and opportunities for future work.
Full Text Available
Revisiting Prompt Engineering via Declarative Crowdsourcing

Parameswaran, Aditya; Shankar, Shreya; Asawa, Parth; Jain, Naman; Wang, Yujie (January 2024, CIDR)

Full Text Available
Automatic and Precise Data Validation for Machine Learning

https://doi.org/10.1145/3583780.3614786

Shankar, Shreya; Fawaz, Labib; Gyllstrom, Karl; Parameswaran, Aditya (October 2023, ACM)

Full Text Available
spade: Synthesizing Data Quality Assertions for Large Language Model Pipelines

https://doi.org/10.14778/3685800.3685835

Shankar, Shreya; Li, Haotian; Asawa, Parth; Hulsebos, Madelon; Lin, Yiming; Zamfirescu-Pereira, J D; Chase, Harrison; Fu-Hinthorn, Will; Parameswaran, Aditya G; Wu, Eugene (August 2024, Proceedings of the VLDB Endowment)

Large language models (LLMs) are being increasingly deployed as part of pipelines that repeatedly process or generate data of some sort. However, a common barrier to deployment is the frequent and often unpredictable errors that plague LLMs. Acknowledging the inevitability of these errors, we propose data quality assertions to identify when LLMs may be making mistakes. We present spade, a method for automatically synthesizing data quality assertions that identify bad LLM outputs. We make the observation that developers often identify data quality issues during prototyping prior to deployment, and attempt to address them by adding instructions to the LLM prompt over time. spade therefore analyzes histories of prompt versions over time to create candidate assertion functions and then selects a minimal set that fulfills both coverage and accuracy requirements. In testing across nine different real-world LLM pipelines, spade efficiently reduces the number of assertions by 14% and decreases false failures by 21% when compared to simpler baselines. spade has been deployed as an offering within LangSmith, LangChain's LLM pipeline hub, and has been used to generate data quality assertions for over 2000 pipelines across a spectrum of industries.
Full Text Available
Towards Observability for Production Machine Learning Pipelines

https://doi.org/10.14778/3565838.3565853

Shankar, Shreya; Parameswaran, Aditya G. (September 2022, Proceedings of the VLDB Endowment)

Software organizations are increasingly incorporating machine learning (ML) into their product offerings, driving a need for new data management tools. Many of these tools facilitate the initial development of ML applications, but sustaining these applications post-deployment is difficult due to lack of real-time feedback (i.e., labels) for predictions and silent failures that could occur at any component of the ML pipeline (e.g., data distribution shift or anomalous features). We propose a new type of data management system that offers end-to-end observability, or visibility into complex system behavior, for deployed ML pipelines through assisted (1) detection, (2) diagnosis, and (3) reaction to ML-related bugs. We describe new research challenges and suggest preliminary solution ideas in all three aspects. Finally, we introduce an example architecture for a "bolt-on" ML observability system, or one that wraps around existing tools in the stack.
Full Text Available
